Data Classification Policies
Data Classification policies detect the presence of specific sensitive data elements or narrowly defined sensitive content classes in text. Common examples include payment card data (PCI), protected health information (PHI), credentials, compensation data, employee review data, and organization-specific regulated identifiers.
When to use this policy type: Use Data Classification when you want to detect actual sensitive content in the text itself — whether entered directly by a user or extracted from a document. Do not use it as the first choice for blocking requests, allusions, or meta-discussion about a sensitive topic when no sensitive data is present. For request- or intent-based detection, use a Behavioral Content Policy instead.
Data Classification works best when the policy can be defined in concrete, observable terms. A strong policy clearly states:
- What data must be present in the text for a violation
- What should remain compliant
- What near-miss or lookalike cases should not be blocked
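Because behaviors for Data Classification are entered by hand, it can help to draft the three parts as structured data and confirm that none is empty before typing them into the policy. The following is a minimal sketch under that assumption. `PolicyDefinition`, `missing_parts`, and the example entries are hypothetical illustrations, not the platform's configuration format and not a recommended PCI policy.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyDefinition:
    """Hypothetical drafting aid for a Data Classification policy.

    NOT the platform's configuration format; it only keeps the three parts
    of a written policy definition side by side so they can be reviewed.
    """
    name: str
    violation: list[str] = field(default_factory=list)  # data that must be present for a violation
    compliant: list[str] = field(default_factory=list)  # content that should remain compliant
    near_miss: list[str] = field(default_factory=list)  # lookalikes that should not be blocked

    def missing_parts(self) -> list[str]:
        """Return the names of any parts that were left empty."""
        parts = {
            "violation": self.violation,
            "compliant": self.compliant,
            "near_miss": self.near_miss,
        }
        return [part for part, entries in parts.items() if not entries]


# Illustrative entries only.
pci_policy = PolicyDefinition(
    name="Payment card data",
    violation=["A full card number or a CVV value appears in the text"],
    compliant=["Discussion of card security practices with no card data present"],
    near_miss=["Order numbers, tracking numbers, or only the last four digits of a card"],
)

print(pci_policy.missing_parts())  # → []
```

A draft that leaves out the near-miss cases would report `['near_miss']`, which is a prompt to add lookalike examples before saving the policy.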
Recommended use cases
- Detecting actual credential material such as passwords, tokens, API keys, or private keys
- Detecting actual payment card numbers, CVVs, and related sensitive financial fields
- Detecting actual PHI or other regulated personal data elements
- Detecting organization-defined structured sensitive fields not covered by built-in PII detection, such as classified information markers or proprietary trade identifiers
Choosing the right policy type
| | Data Classification | Behavioral Content | PII Detection |
|---|---|---|---|
| Best for | Detecting actual sensitive content present in text | Detecting requests or intent, even when no sensitive data is present | Built-in detection of standard personal identifiers |
| Example trigger | "My card number is 4111 1111 1111 1111" | "What is John's SSN?" | Name, email, phone number, SSN |
| Supports redact / sanitize? | No | No | Yes |
| Custom scope? | Yes — define your own data boundary | Yes — define in natural language | Extensible via regex or blocklist |
Use Data Classification when the policy boundary is custom, domain-specific, or broader than standard PII, and you need a trained classifier to detect actual sensitive content — such as credentials, PCI, PHI, compensation data, or proprietary structured fields.
Use PII Detection when you need built-in detection for standard personal identifiers, especially when you need redact or sanitize behavior.
Use Behavioral Content Policy when what you want to block is the request or intent rather than the presence of sensitive data itself — for example, "Can you look up the CVV on file?" when no actual CVV is present.
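The difference between "sensitive data is present" and "sensitive data is requested" can be made concrete with a deterministic check. Data Classification uses a trained classifier, not the rule below; this is only a minimal Python sketch that swaps in a regex plus the Luhn (mod 10) checksum as a stand-in for "an actual card number is in the text". The function names `luhn_valid` and `contains_card_number` are hypothetical.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    """Return True if the text contains a card-shaped digit run that passes Luhn."""
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return True
    return False


print(contains_card_number("My card number is 4111 1111 1111 1111"))  # → True
print(contains_card_number("Can you look up the CVV on file?"))        # → False
print(contains_card_number("Tracking number 4111 1111 1111 1112"))     # → False
```

The first input contains actual card data, which is what Data Classification targets. The second contains no card data at all, so a presence check cannot fire on it; blocking that request is a job for a Behavioral Content Policy. The third is a near-miss lookalike that a well-scoped policy should leave compliant.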
Current platform notes
- Data Classification is created through the Custom Content policy flow.
- In the current UI, Data Classification is configured as an input policy.
- Generate Behaviors is not available for Data Classification — allowed and prohibited behaviors must be entered manually.
- If you need redact or sanitize behavior, use PII Detection instead.